Authoritative Citation KNN Learning with Noisy Training Datasets

Authors

  • Joseph Bernadt
  • Leen-Kiat Soh
Abstract

In this paper, we investigate the effectiveness of Citation K-Nearest Neighbors (KNN) learning with noisy training datasets. We devise an authority measure associated with each training instance that changes based on the outcome of Citation KNN classification. The authority is increased when a citer's classification is correct and decreased when it is not. We show that by modifying only these authority measures, the classification accuracy of Citation KNN improves significantly on a variety of datasets with different noise levels. We also identify the general characteristics of a dataset that affect the improvement percentages. We conclude that the new algorithm is able to regulate the roles of good and noisy training instances using a very simple authority measure to improve classification.
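The abstract does not give the exact update rule, so the sketch below is only a rough, instance-level illustration of the mechanism it describes: references and citers vote with their authority weights, and citers' weights are nudged up or down according to whether their label matched the true label. The published Citation KNN operates on bags of instances with a Hausdorff distance; the function name citation_knn_authority, the fixed step delta, and the clipping of authority to [0, 1] are assumptions made for this illustration, not the authors' formulation.

```python
import numpy as np

def citation_knn_authority(train_X, train_y, queries, query_labels=None,
                           n_refs=3, n_citers=5, delta=0.1):
    """Rough instance-level sketch of Citation KNN with per-instance authority.

    Assumptions (not from the paper): Euclidean distance over single
    instances instead of a bag-level Hausdorff distance, a fixed +/- delta
    authority update, and clipping of authority to [0, 1].
    """
    authority = np.ones(len(train_X))              # every training instance starts fully trusted
    train_dists = np.linalg.norm(train_X[:, None] - train_X[None, :], axis=2)
    np.fill_diagonal(train_dists, np.inf)          # an instance is not its own neighbour
    predictions = []

    for qi, q in enumerate(queries):
        d = np.linalg.norm(train_X - q, axis=1)

        # References: the query's own nearest training instances.
        refs = np.argsort(d)[:n_refs]

        # Citers: training instances that would rank the query among their
        # n_citers nearest neighbours.
        citers = [i for i in range(len(train_X))
                  if np.sum(train_dists[i] < d[i]) < n_citers]

        voters = np.unique(np.concatenate([refs, np.asarray(citers, dtype=int)]))

        # Authority-weighted vote over the labels of references and citers.
        scores = {}
        for i in voters:
            scores[train_y[i]] = scores.get(train_y[i], 0.0) + authority[i]
        pred = max(scores, key=scores.get)
        predictions.append(pred)

        # Feedback step (per the abstract, applied to citers): raise the
        # authority of citers whose label matched the true label, lower it
        # otherwise, when the true label is available.
        if query_labels is not None:
            for i in citers:
                if train_y[i] == query_labels[qi]:
                    authority[i] = min(1.0, authority[i] + delta)
                else:
                    authority[i] = max(0.0, authority[i] - delta)

    return predictions, authority
```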

Similar Articles

An Appraise of KNN to the Perfection

K-Nearest Neighbor (KNN) is a highly efficient classification algorithm thanks to key features such as ease of use, low training time, robustness to noisy training data, and ease of implementation. However, it also has shortcomings such as high computational complexity, a large memory requirement for large training datasets, the curse of dimensionality, and equal weights given to all attributes (a minimal baseline sketch follows this entry). Many ...

Full text
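For reference, a baseline KNN of the kind characterised above keeps the entire training set in memory and gives every attribute equal weight in the distance computation. The sketch below (hypothetical helper knn_predict, Euclidean distance assumed) just illustrates those two properties.

```python
import numpy as np

def knn_predict(train_X, train_y, query, k=5):
    """Minimal baseline KNN: the whole training set is kept in memory and
    every attribute contributes with equal weight to the Euclidean distance,
    the memory and attribute-weighting limitations noted above."""
    d = np.linalg.norm(train_X - query, axis=1)        # distance to every stored instance
    nearest = np.argsort(d)[:k]                        # indices of the k closest instances
    labels, counts = np.unique(np.asarray(train_y)[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # unweighted majority vote
```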

An Effective Approach for Robust Metric Learning in the Presence of Label Noise

Many algorithms in machine learning, pattern recognition, and data mining are based on a similarity/distance measure. For example, the kNN classifier and clustering algorithms such as k-means require a similarity/distance function. Also, in Content-Based Information Retrieval (CBIR) systems, we need to rank the retrieved objects based on the similarity to the query. As generic measures such as ...

Full text

k-Nearest Neighbor Augmented Neural Networks for Text Classification

In recent years, many deep-learning based models have been proposed for text classification. Such models fit the training set well from a statistical point of view, but they lack the capacity to utilize instance-level information from individual instances in the training set. In this work, we propose to enhance neural network models by allowing them to leverage information from k-nea...

Full text

Study of Different Multi-instance Learning kNN Algorithms

Because of its applicability in various fields, multi-instance learning, or the multi-instance problem, is becoming more popular in the machine learning research field. Unlike standard supervised learning, multi-instance learning concerns the problem of classifying an unknown bag as positive or negative when the labels of the instances within the bags are ambiguous. This paper uses and studies three different ...

Full text

Joint Optimization Framework for Learning with Noisy Labels

Deep neural networks (DNNs) trained on large-scale datasets have exhibited significant performance in image classification. Many large-scale datasets are collected from websites; however, they tend to contain inaccurate labels that are termed noisy labels. Training on such noisily labeled datasets causes performance degradation because DNNs easily overfit to noisy labels. To overcome this probl...

Full text

Journal:

Volume   Issue

Pages  -

Publication date: 2004